HALLS: An Energy-Efficient Highly Adaptable Last Level STT-RAM Cache for Multicore Systems
Spin-Transfer Torque RAM (STT-RAM) is widely considered a promising
alternative to SRAM in the memory hierarchy due to STT-RAM's non-volatility,
low leakage power, high density, and fast read speed. STT-RAM's small
feature size is particularly desirable for the last-level cache (LLC), which
typically occupies a large portion of the silicon die. However, long write
latency and high write energy remain key challenges for implementing STT-RAMs
in the CPU cache. An increasingly popular method for addressing these challenges involves
trading off non-volatility for reduced write latency and write energy by
relaxing the STT-RAM's data retention time. However, in order to maximize
energy saving potential, the cache configurations, including STT-RAM's
retention time, must be dynamically adapted to executing applications' variable
memory needs. In this paper, we propose a highly adaptable last level STT-RAM
cache (HALLS) that allows the LLC configurations and retention time to be
adapted to applications' runtime execution requirements. We also propose
low-overhead runtime tuning algorithms to dynamically determine the best
(lowest energy) cache configurations and retention times for executing
applications. Compared to prior work, HALLS reduced the average energy
consumption by 60.57% in a quad-core system, while introducing marginal latency
overhead.
Comment: To appear in IEEE Transactions on Computers (TC).
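The runtime tuning the abstract describes can be illustrated with a minimal sketch. This is a hypothetical greedy search, not the paper's actual algorithm: the candidate retention times, associativities, and the toy energy model below are all illustrative assumptions.

```python
# Hypothetical sketch of a low-overhead tuning loop in the spirit of HALLS:
# sample candidate (retention time, associativity) configurations against a
# profiled access pattern and keep the lowest-energy one.
# The energy model is a stand-in, NOT the paper's model.

RETENTION_TIMES_US = [10, 100, 1000]   # candidate STT-RAM retention times (us)
ASSOCIATIVITIES = [4, 8, 16]           # candidate LLC associativities

def estimate_energy(retention_us, ways, writes, reads, refreshes):
    """Toy model: shorter retention -> cheaper writes but more refreshes."""
    write_energy = 1.0 / (retention_us ** 0.25)  # relaxed retention lowers write cost
    refresh_energy = refreshes / retention_us    # but forces more frequent refreshes
    leakage = 0.01 * ways                        # more ways -> more leakage
    return writes * write_energy + reads * 0.1 + refresh_energy + leakage

def tune(profile):
    """Return the (retention, ways) pair with the lowest estimated energy."""
    best = None
    for rt in RETENTION_TIMES_US:
        for ways in ASSOCIATIVITIES:
            e = estimate_energy(rt, ways, profile["writes"],
                                profile["reads"], profile["refreshes"])
            if best is None or e < best[0]:
                best = (e, rt, ways)
    return best[1], best[2]

profile = {"writes": 1_000, "reads": 10_000, "refreshes": 500}
rt, ways = tune(profile)
```

For this read-heavy profile the search settles on the longest retention time and smallest associativity; a write-heavy profile would shift the balance toward shorter retention.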
A Survey of Phase Classification Techniques for Characterizing Variable Application Behavior
Adaptable computing is an increasingly important paradigm that specializes
system resources to variable application requirements, environmental
conditions, or user requirements. Adapting computing resources to variable
application requirements (or application phases) is otherwise known as
phase-based optimization. Phase-based optimization takes advantage of
application phases, i.e., execution intervals during which an application
behaves similarly, to enable effective and beneficial adaptability. In order for
phase-based optimization to be effective, the phases must first be classified
to determine when application phases begin and end, and ensure that system
resources are accurately specialized. In this paper, we present a survey of
phase classification techniques that have been proposed to exploit the
advantages of adaptable computing through phase-based optimization. We focus on
recent techniques and classify these techniques with respect to several factors
in order to highlight their similarities and differences. We divide the
techniques by their major defining characteristics---online/offline and
serial/parallel. In addition, we discuss other characteristics such as
prediction and detection techniques, the characteristics used for prediction,
interval type, etc. We also identify gaps in the state-of-the-art and discuss
future research directions to enable and fully exploit the benefits of
adaptable computing.
Comment: To appear in IEEE Transactions on Parallel and Distributed Systems
(TPDS).
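A common family of techniques this survey covers summarizes each fixed-length execution interval as a signature vector (for example, basic-block frequencies) and declares a new phase when consecutive signatures diverge. The sketch below is an assumed minimal instance of that idea, not any specific surveyed technique; the vectors and threshold are made up.

```python
# Minimal sketch of online phase classification via signature-vector
# similarity: each interval yields a frequency vector, and a new phase is
# declared when the distance between consecutive vectors exceeds a threshold.

def manhattan(a, b):
    """Manhattan (L1) distance between two interval signature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify_phases(interval_vectors, threshold):
    """Assign a phase ID to each interval; a large change starts a new phase."""
    phases, current, prev = [], 0, None
    for vec in interval_vectors:
        if prev is not None and manhattan(vec, prev) > threshold:
            current += 1          # behavior changed enough: new phase begins
        phases.append(current)
        prev = vec
    return phases

vectors = [[5, 1, 0], [5, 2, 0], [0, 0, 9], [0, 1, 9]]
print(classify_phases(vectors, threshold=3))   # [0, 0, 1, 1]
```

The first two intervals stay in phase 0 because their signatures differ by less than the threshold; the jump to the third interval triggers a new phase.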
MirrorCache: An Energy-Efficient Relaxed Retention L1 STTRAM Cache
Spin-Transfer Torque RAM (STTRAM) is a promising alternative to SRAM in on-chip caches due to several advantages, including non-volatility, low leakage, high integration density, and CMOS compatibility. However, STTRAMs' wide adoption in resource-constrained systems is impeded, in part, by high write energy and latency. A popular approach to mitigating these overheads involves relaxing the STTRAM's retention time in order to reduce the write latency and energy. However, this approach usually requires a dynamic refresh scheme to maintain cache blocks' data integrity beyond the retention time, and typically requires an external refresh buffer. In this paper, we propose MirrorCache, an energy-efficient, buffer-free refresh scheme. MirrorCache leverages the STTRAM cell's compact feature size and uses an auxiliary segment of the same size as the logical cache to handle refresh operations without the overheads of an external refresh buffer. Our experiments show that, compared to prior work, MirrorCache can reduce the average cache energy by at least 39.7% for a variety of systems.
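The buffer-free refresh idea can be illustrated with a simplified functional model. This is an assumption about how a mirrored auxiliary segment could behave, not the paper's exact design: each logical block gets a sibling slot, and a refresh copies the live data into that sibling, whose write resets the retention clock, so no external refresh buffer is needed.

```python
# Simplified functional model (illustrative, not the paper's design) of a
# buffer-free mirrored refresh: each logical block has two physical slots;
# refreshing copies data to the sibling slot in place of an external buffer.

class MirrorCacheModel:
    def __init__(self, num_blocks, retention):
        self.retention = retention
        self.slots = [[None, None] for _ in range(num_blocks)]  # [primary, mirror]
        self.active = [0] * num_blocks      # which slot currently holds live data
        self.written_at = [0] * num_blocks  # time of the last write per block

    def write(self, idx, data, now):
        self.slots[idx][self.active[idx]] = data
        self.written_at[idx] = now

    def read(self, idx, now):
        # If the block is about to expire, refresh by copying to the sibling
        # slot; the copy is itself a fresh write, resetting the retention clock.
        if now - self.written_at[idx] >= self.retention:
            src = self.active[idx]
            self.slots[idx][1 - src] = self.slots[idx][src]
            self.active[idx] = 1 - src
            self.written_at[idx] = now
        return self.slots[idx][self.active[idx]]

cache = MirrorCacheModel(num_blocks=4, retention=10)
cache.write(0, "blk-A", now=0)
assert cache.read(0, now=5) == "blk-A"    # still within retention
assert cache.read(0, now=12) == "blk-A"   # expired: refreshed via mirror slot
assert cache.active[0] == 1               # data now lives in the mirror slot
```

Because the refresh is a copy between two in-array slots, the model never allocates storage outside the cache array itself, mirroring the abstract's claim of avoiding an external refresh buffer.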
Design Space Exploration of Sparsity-Aware Application-Specific Spiking Neural Network Accelerators
Spiking Neural Networks (SNNs) offer a promising alternative to Artificial
Neural Networks (ANNs) for deep learning applications, particularly in
resource-constrained systems. This is largely due to their inherent sparsity,
influenced by factors such as the input dataset, the length of the spike train,
and the network topology. While a few prior works have demonstrated the
advantages of incorporating sparsity into the hardware design, especially in
terms of reducing energy consumption, the impact on hardware resources has not
yet been explored. This is where design space exploration (DSE) becomes
crucial, as it allows for the optimization of hardware performance by tailoring
both the hardware and model parameters to suit specific application needs.
However, DSE can be extremely challenging given the potentially large design
space and the interplay of hardware architecture design choices and
application-specific model parameters.
In this paper, we propose a flexible hardware design that leverages the
sparsity of SNNs to identify highly efficient, application-specific accelerator
designs. We develop a high-level, cycle-accurate simulation framework for this
hardware and demonstrate the framework's benefits in enabling detailed and
fine-grained exploration of SNN design choices, such as the layer-wise
logical-to-hardware ratio (LHR). Our experimental results show that our design
can (i) achieve substantial reductions in hardware resources and (ii) deliver
considerable speedups, while requiring fewer hardware resources compared to
sparsity-oblivious designs. We further showcase the robustness of our framework
by varying spike train lengths with different neuron population sizes to find
the optimal trade-off points between accuracy and hardware latency.
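The kind of design-space exploration described above can be sketched in a few lines. Everything below is an illustrative assumption: the layer sizes, the LHR choices, and the toy cost model stand in for the paper's cycle-accurate simulator, with exhaustive enumeration plus Pareto filtering standing in for its DSE flow.

```python
# Hedged sketch of LHR design-space exploration: enumerate per-layer
# logical-to-hardware ratios, score each design with a toy cost model, and
# keep the Pareto-optimal (resources, latency) trade-off points.

from itertools import product

LAYER_NEURONS = [128, 64, 10]   # hypothetical SNN topology
LHR_CHOICES = [1, 2, 4, 8]      # logical neurons time-multiplexed per PE

def cost(lhr_per_layer, sparsity=0.8):
    """Toy model: higher LHR -> fewer PEs (less area) but more cycles."""
    resources = sum(n // r for n, r in zip(LAYER_NEURONS, lhr_per_layer))
    # Sparsity skips a fraction of spike events, shrinking effective latency.
    latency = sum(r * (1 - sparsity) * n
                  for n, r in zip(LAYER_NEURONS, lhr_per_layer))
    return resources, latency

def pareto_front(points):
    """Keep designs not dominated in both resources and latency."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                       for q in points)]

points = [cost(c) for c in product(LHR_CHOICES, repeat=len(LAYER_NEURONS))]
front = pareto_front(points)
```

The front always contains the two extremes, maximum time-multiplexing (smallest area, highest latency) and fully parallel (largest area, lowest latency), plus intermediate trade-off points an architect could pick from for a given application.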